AITopics | softmax function

softmax function

Sampled Estimators For Softmax Must Be Biased

Neural Information Processing Systems | Jun-11-2026, 22:26:25 GMT

Models requiring probabilistic outputs are ubiquitous and used in fields such as natural language processing, contrastive learning, and recommendation systems. The standard method of designing such a model is to output unconstrained logits, which are normalized into probabilities with the softmax function. The normalization involves computing a summation across all classes, which becomes prohibitively expensive for problems with a large number of classes. An important strategy to reduce the cost is to sum over a sampled subset of classes in the softmax function, known as the sampled softmax. The sampled softmax is known to be biased: its expectation taken over the sampled classes is not equal to the softmax function. Many works have focused on reducing the bias by using a better way of sampling the subset. However, it has been unclear whether an unbiased function different from sampled softmax exists. In this paper, we show that all functions that only access a sampled subset of classes must be biased. This result discourages further efforts to find unbiased loss functions and validates that past efforts devoted to reducing bias are the best we can do.
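To make the bias concrete, here is a minimal sketch in Python. It is not the paper's construction: it assumes a simplified sampled softmax that normalizes over the target class plus a uniformly drawn subset of the other classes, with no correction term. The function names and the example logits are hypothetical. Because the sampled denominator omits positive terms, each sampled estimate of the target probability is at least the full softmax value, so the average over many draws overshoots it.

```python
import math
import random

def softmax(logits):
    """Full softmax: normalizes over every class."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def sampled_softmax_prob(logits, target, num_sampled, rng):
    """Target probability normalized over the target plus a uniform random
    subset of the other classes (simplified: no correction term)."""
    others = [i for i in range(len(logits)) if i != target]
    subset = [target] + rng.sample(others, num_sampled)
    m = max(logits[i] for i in subset)
    exps = {i: math.exp(logits[i] - m) for i in subset}
    return exps[target] / sum(exps.values())

rng = random.Random(0)  # fixed seed so the run is repeatable
logits = [2.0, 1.0, 0.5, 0.0, -0.5, -1.0]  # hypothetical example logits
target = 0

full = softmax(logits)[target]
trials = 20000
mean_sampled = sum(
    sampled_softmax_prob(logits, target, 2, rng) for _ in range(trials)
) / trials

print(f"full softmax: {full:.4f}, mean sampled softmax: {mean_sampled:.4f}")
```

Practical variants typically draw the subset from a proposal distribution and adjust the logits accordingly, which reduces this gap; the abstract's claim is that no function restricted to a sampled subset can remove it entirely.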

artificial intelligence, machine learning, proceedings, (4 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Generalizable Multi-Linear Attention Network

Neural Information Processing Systems | Apr-25-2026, 18:55:21 GMT

The majority of existing multimodal sequential learning methods focus on obtaining powerful individual representations and neglect to effectively capture the multimodal joint representation. The bilinear attention network (BAN) is a commonly used integration method, which leverages tensor operations to associate the features of different modalities. However, BAN scales poorly to more modalities, since the computational complexity of the attention map increases exponentially with the number of modalities. To address this, we propose a new method called generalizable multi-linear attention network (MAN), which can associate more modalities at acceptable complexity with hierarchical approximation decomposition. Specifically, because softmax attention kernels cannot be directly decomposed into linear operations, we adopt the addition random features (ARF) mechanism to approximate the non-linear softmax functions, supported by theoretical analysis. Furthermore, we introduce local sequential constraints, which can be conveniently combined with ARF, as positional information. We conduct extensive experiments on several datasets of corresponding tasks. The results show that MAN achieves competitive results compared with baseline methods, showcasing the effectiveness of our contributions.
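The idea of approximating a softmax kernel with random features can be illustrated with a short Python sketch. This is not the ARF mechanism from the abstract, whose details are not given here. It instead uses the well-known positive random features identity exp(q·k) = E_w[exp(w·q − |q|²/2) · exp(w·k − |k|²/2)] with w drawn from a standard normal distribution. The function names, vectors, and feature count are all hypothetical. The point is that the kernel becomes an inner product of two feature maps, which is what lets attention be computed with linear operations.

```python
import math
import random

def positive_random_features(x, ws):
    """Map x to features whose inner products approximate exp(q . k)."""
    half_sq_norm = sum(v * v for v in x) / 2.0
    scale = 1.0 / math.sqrt(len(ws))
    return [
        scale * math.exp(sum(wi * xi for wi, xi in zip(w, x)) - half_sq_norm)
        for w in ws
    ]

def approx_softmax_kernel(q, k, ws):
    """Inner product of the two feature maps: a Monte Carlo estimate."""
    fq = positive_random_features(q, ws)
    fk = positive_random_features(k, ws)
    return sum(a * b for a, b in zip(fq, fk))

rng = random.Random(0)  # fixed seed so the run is repeatable
dim, num_features = 2, 20000
# Random projection directions, each entry drawn from N(0, 1).
ws = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_features)]

q = [0.3, -0.2]  # hypothetical query vector
k = [0.1, 0.4]   # hypothetical key vector

exact = math.exp(sum(a * b for a, b in zip(q, k)))
approx = approx_softmax_kernel(q, k, ws)
print(f"exact: {exact:.4f}, random-feature estimate: {approx:.4f}")
```

The estimate tightens as the number of features grows, and its variance increases with the norms of q and k, so small-norm inputs are used in this sketch.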

artificial intelligence, machine learning, natural language, (18 more...)

Genre: Research Report (0.48)

Industry: Leisure & Entertainment (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.56)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.49)

Adaptive Sampling for Efficient Softmax Approximation

Neural Information Processing Systems | Mar-22-2026, 15:30:43 GMT

The softmax function is ubiquitous in machine learning and optimization applications. Computing the full softmax evaluation of a matrix-vector product can be computationally expensive in high-dimensional settings. In many applications, however, it is sufficient to calculate only the top few outputs of the softmax function. In this work, we present an algorithm, dubbed AdaptiveSoftmax, that adaptively computes the top k softmax values more efficiently than the full softmax computation, with probabilistic guarantees. We demonstrate the sample efficiency improvements afforded by AdaptiveSoftmax on real and synthetic data to corroborate our theoretical results.
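For reference, the full computation that the abstract describes as expensive can be sketched in Python as follows. This is only the exhaustive baseline, not the AdaptiveSoftmax algorithm: it evaluates every entry of the matrix-vector product, applies softmax, and then keeps the top k values. The function names and example inputs are hypothetical.

```python
import heapq
import math

def matvec(A, x):
    """Dense matrix-vector product; cost grows with rows times columns."""
    return [sum(a * b for a, b in zip(row, x)) for row in A]

def top_k_softmax(A, x, k):
    """Exhaustive baseline: softmax of A @ x, then the k largest entries."""
    logits = matvec(A, x)
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Indices of the k largest probabilities, highest first.
    top = heapq.nlargest(k, range(len(probs)), key=probs.__getitem__)
    return [(i, probs[i]) for i in top]

A = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [-1.0, 0.5]]  # hypothetical matrix
x = [2.0, 1.0]                                          # hypothetical vector

result = top_k_softmax(A, x, 2)
for index, prob in result:
    print(f"class {index}: {prob:.4f}")
```

Every row of A is touched in full here, which is the cost that an adaptive method would aim to avoid when only the top few outputs are needed.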

artificial intelligence, machine learning, proceedings, (7 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Deep Neural Nets with Interpolating Function as Output Activation

Neural Information Processing Systems | Mar-16-2026, 21:56:28 GMT

We replace the output layer of deep neural nets, typically the softmax function, with a novel interpolating function, and we propose end-to-end training and testing algorithms for this new architecture. Compared to classical neural nets with the softmax function as output activation, the surrogate with an interpolating function as output activation combines the advantages of both deep and manifold learning. The new framework demonstrates the following major advantages: first, it is better suited to cases with insufficient training data; second, it significantly improves the generalization accuracy on a wide variety of networks. The algorithm is implemented in PyTorch, and the code is available at https://github.com/

artificial intelligence, machine learning, proceedings, (6 more...)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.66)

Adaptive Sampling for Efficient Softmax Approximation

Neural Information Processing Systems | Feb-18-2026, 07:02:32 GMT

The softmax function is ubiquitous in machine learning and optimization applications.

algorithm, artificial intelligence, machine learning, (18 more...)

Country:

North America > United States > California > Santa Clara County > Stanford (0.04)
North America > United States > California > Santa Clara County > Palo Alto (0.04)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)
(3 more...)

Genre:

Research Report > New Finding (0.68)
Research Report > Experimental Study (0.68)

Industry: Government (0.46)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

791d3337291b2c574545aeecfa75484c-Paper-Conference.pdf

Neural Information Processing Systems | Feb-15-2026, 02:36:35 GMT

artificial intelligence, estimator, machine learning, (14 more...)

Country:

North America > United States > Massachusetts > Middlesex County > Cambridge (0.14)
Asia > Singapore (0.04)
Asia > Middle East > Jordan (0.04)
Asia > China > Chongqing Province > Chongqing (0.04)

Genre: Research Report (1.00)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.46)

Robust Bi-Tempered Logistic Loss Based on Bregman Divergences

Ehsan Amid, Manfred K. K. Warmuth, Rohan Anil, Tomer Koren

Neural Information Processing Systems | Feb-12-2026, 21:36:45 GMT

Neural Information Processing Systems http://nips.cc/

divergence, logistic loss, loss function, (14 more...)

Country:

Asia > Middle East > Israel > Tel Aviv District > Tel Aviv (0.04)
North America > United States > Pennsylvania (0.04)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
(4 more...)

Genre: Research Report > New Finding (0.46)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.95)

DropMax: Adaptive Variational Softmax

Hae Beom Lee, Juho Lee, Saehoon Kim, Eunho Yang, Sung Ju Hwang

Neural Information Processing Systems | Feb-12-2026, 15:16:41 GMT

Neural Information Processing Systems http://nips.cc/

classifier, dropmax, probability, (15 more...)

Country:

North America > Canada > Quebec > Montreal (0.04)
Europe > United Kingdom > England > Oxfordshire > Oxford (0.04)
Asia > South Korea (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)

Generalizable Multi-Linear Attention Network

Neural Information Processing Systems | Feb-8-2026, 13:26:05 GMT

artificial intelligence, machine learning, transformer, (18 more...)

Country: Asia > China (0.04)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)

Balanced Meta-Softmax for Long-Tailed Visual Recognition

Neural Information Processing Systems | Feb-7-2026, 22:34:22 GMT

Taking an instance segmentation dataset, LVIS [7], for example, the number of instances in the banana class can be thousands of times more than that of a bait class.

artificial intelligence, dataset, machine learning, (18 more...)

Country: North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)

Genre: Research Report > New Finding (0.47)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.70)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.46)
